<html>
<title>Blast Settings</title>
<body style="padding:20px;width:650px;font-family:Times">
<h2>Blast Settings</h2>

<h3>Blast File</h3>

To generate a new blast file under the default name, leave this box unchanged.
To use an existing blast file, browse to it in this box. To generate a new file
under a name other than the default, edit the name here.

<h3>Use Blast File/Generate Blast File</h3>

If you already have a blast file (e.g., computed on an HPC cluster), select
"Use Blast File" and browse to the file in the "Blast File" box above. The
blast file must be in tabular format and must use the same sequence names as the
sTCW projects. The "Build Database" function (on the main panel) creates a file
called
<br>&nbsp;&nbsp;&nbsp;/projcmp/&lt;your project name&gt;/blastResults/Combined.fasta 
<br>which can be used for the self-blast.
<p>
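Otherwise, choose "Generate Blast File" and enter the blast parameters you wish to use.
The default parameters are for the legacy blastp; if you are using blast+, change
the parameters to their blast+ equivalents (for example, legacy blast selects tabular
output with -m 8, whereas blast+ uses -outfmt 6).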
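<p>
Before using an externally computed blast file, it can help to confirm that its sequence
names match those in Combined.fasta. The following is a minimal Python sketch of such a
check, not part of TCW; the function names are hypothetical, and it assumes only that the
first two tab-delimited columns of each line are the query and subject names, as in
standard tabular blast output.
<pre>
```python
def fasta_names(fasta_text):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def unknown_names(blast_text, known_names):
    """Return query/subject names in tabular blast text missing from known_names."""
    missing = set()
    for line in blast_text.splitlines():
        # Skip blank lines and '#' comment lines (some tabular variants add them).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if 2 > len(fields):
            raise ValueError("line is not tab-delimited: " + line)
        # Assumption: columns 1 and 2 hold the query and subject names.
        for name in fields[:2]:
            if name not in known_names:
                missing.add(name)
    return missing
```
</pre>
An empty result from unknown_names means every name in the blast file also appears in
the FASTA file; any names returned would need to be reconciled before the file is used.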

<h3>Filter Blast File</h3>

Transcript assembly often generates multiple transcripts that produce identical or
nearly identical protein sequences after ESTScan; these all cluster together and
artificially inflate the size of some clusters. The "Filter Blast File" option removes
these redundant sequences from clustering (they remain in the database but are not
placed in any cluster). Only the longest representative of each redundant set is
retained for use in clustering.
<p>
The "%Identity" parameter is the usual blast percent identity, while "Max Nonalign" is the
maximum number of amino acids at the protein ends that are allowed to remain unaligned.
For example, with %Identity=100 and Max Nonalign=2, a perfect alignment that starts at the
4th amino acid of one protein leaves three amino acids unaligned at that end, so the pair
is not considered redundant.
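<p>
The redundancy test described above can be sketched in Python as follows. This is an
illustration only, not TCW's actual implementation: the Hit record and the is_redundant
function are hypothetical, and the sketch assumes that unaligned amino acids are counted
separately at each end of each protein and that the protein lengths are known (for
example, from the FASTA file).
<pre>
```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One blast hit plus both protein lengths (hypothetical record)."""
    pident: float  # percent identity of the alignment
    qstart: int    # 1-based alignment start on the query
    qend: int      # 1-based alignment end on the query
    qlen: int      # full length of the query protein
    sstart: int    # 1-based alignment start on the subject
    send: int      # 1-based alignment end on the subject
    slen: int      # full length of the subject protein


def is_redundant(hit, min_identity=100.0, max_nonalign=2):
    """Return True if the hit meets both filter thresholds."""
    # Unaligned amino acids at each end of each protein (assumed interpretation).
    unaligned_ends = [
        hit.qstart - 1,
        hit.qlen - hit.qend,
        hit.sstart - 1,
        hit.slen - hit.send,
    ]
    return hit.pident >= min_identity and max_nonalign >= max(unaligned_ends)
```
</pre>
Under these assumptions, a 100% identity hit that starts at position 4 of the query
leaves three unaligned amino acids at that end, so is_redundant returns False with
max_nonalign=2; the same hit starting at position 3 would return True.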